Getting Started with earthaccess

Analyzing Sea Level Rise

Overview

This tutorial analyzes global sea level rise from satellite altimetry data and compares it to a historic record. We will be reproducing the plot from this infographic: NASA-led Study Reveals the Causes of Sea Level Rise Since 1900.

Learning Objectives

  • Search for data programmatically using keywords or a dataset’s concept_id
  • Access data using the earthaccess Python library
  • Visualize sea level rise trends from altimetry datasets and compare them against a historic record

Requirements

1. Compute environment

  • This notebook can run anywhere thanks to earthaccess!

2. Earthdata Login

An Earthdata Login account is required to access data, as well as to discover restricted data, from the NASA Earthdata system. Please visit https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account. The account is free to create and only takes a moment to set up.

Import (or Install) Packages

import logging
logging.basicConfig(level=logging.INFO, force=True)

try:
    import earthaccess
    import xarray as xr
    from pyproj import Geod
    import numpy as np
    import hvplot.xarray
    from matplotlib import pyplot as plt
    from pprint import pprint
    import panel as pn
    import panel.widgets as pnw
except ImportError:
    logging.warning("Installing missing dependencies...")
    %pip install -q earthaccess matplotlib hvplot pyproj xarray numpy h5netcdf panel
    import earthaccess
    import xarray as xr
    from pyproj import Geod
    import numpy as np
    import hvplot.xarray
    from matplotlib import pyplot as plt
    from pprint import pprint
    import panel as pn
    import panel.widgets as pnw

logging.info("Dependencies imported")
    
INFO:root:Dependencies imported

earthaccess and NASA’s EDL

We recommend authenticating your Earthdata Login (EDL) information using the earthaccess python package as follows:

auth = earthaccess.login()
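Called with no arguments, earthaccess.login() will prompt for credentials interactively (or read them from a ~/.netrc file). For scripted, non-interactive runs, credentials can instead be supplied through environment variables; a minimal sketch, using the variable names earthaccess documents for its “environment” strategy:

```python
import os

# Placeholders only -- replace with your real Earthdata Login credentials.
os.environ["EARTHDATA_USERNAME"] = "your_username"
os.environ["EARTHDATA_PASSWORD"] = "your_password"

# earthaccess can then authenticate without prompting:
# auth = earthaccess.login(strategy="environment")
print("EARTHDATA_USERNAME" in os.environ)  # True
```

Either way, the resulting session is reused by later earthaccess calls in the notebook.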

Search for Sea Surface Height Data

Let’s search for collections matching the keyword “SEA SURFACE HEIGHT” (up to 10 results) and print out the first two.

max_results = 10

collections = earthaccess.search_datasets(
    keyword="SEA SURFACE HEIGHT",
    cloud_hosted=True,
    count=max_results
)

# Print the first 2 collections
for collection in collections[0:2]:
    pprint(collection.summary())
    print(collection.abstract(), "\n", collection["umm"]["DOI"], "\n\n")
Datasets found: 245
{'cloud-info': {'Region': 'us-west-2',
                'S3BucketAndObjectPrefixNames': ['podaac-ops-cumulus-public/SEA_SURFACE_HEIGHT_ALT_GRIDS_L4_2SATS_5DAY_6THDEG_V_JPL2205/',
                                                 'podaac-ops-cumulus-protected/SEA_SURFACE_HEIGHT_ALT_GRIDS_L4_2SATS_5DAY_6THDEG_V_JPL2205/'],
                'S3CredentialsAPIDocumentationURL': 'https://archive.podaac.earthdata.nasa.gov/s3credentialsREADME',
                'S3CredentialsAPIEndpoint': 'https://archive.podaac.earthdata.nasa.gov/s3credentials'},
 'concept-id': 'C2270392799-POCLOUD',
 'file-type': "[{'Format': 'netCDF-4', 'FormatType': 'Native', "
              "'AverageFileSize': 9.7, 'AverageFileSizeUnit': 'MB'}]",
 'get-data': ['https://cmr.earthdata.nasa.gov/virtual-directory/collections/C2270392799-POCLOUD',
              'https://search.earthdata.nasa.gov/search/granules?p=C2270392799-POCLOUD'],
 'short-name': 'SEA_SURFACE_HEIGHT_ALT_GRIDS_L4_2SATS_5DAY_6THDEG_V_JPL2205',
 'version': '2205'}
This dataset provides gridded Sea Surface Height Anomalies (SSHA) above a mean sea surface, on a 1/6th degree grid every 5 days. It contains the fully corrected heights, with a delay of up to 3 months. The gridded data are derived from the along-track SSHA data of TOPEX/Poseidon, Jason-1, Jason-2, Jason-3 and Jason-CS (Sentinel-6) as reference data from the level 2 along-track data found at https://podaac.jpl.nasa.gov/dataset/MERGED_TP_J1_OSTM_OST_CYCLES_V51, plus ERS-1, ERS-2, Envisat, SARAL-AltiKa, CryoSat-2, Sentinel-3A, Sentinel-3B, depending on the date, from the RADS database. The date given in the grid files is the center of the 5-day window. The grids were produced from altimeter data using Kriging interpolation, which gives best linear prediction based upon prior knowledge of covariance. 
 {'DOI': '10.5067/SLREF-CDRV3', 'Authority': 'https://doi.org'} 


{'cloud-info': {'Region': 'us-west-2',
                'S3BucketAndObjectPrefixNames': ['podaac-ops-cumulus-protected/ALTIKA_SARAL_L2_OST_XOGDR/',
                                                 'podaac-ops-cumulus-public/ALTIKA_SARAL_L2_OST_XOGDR/'],
                'S3CredentialsAPIDocumentationURL': 'https://archive.podaac.earthdata.nasa.gov/s3credentialsREADME',
                'S3CredentialsAPIEndpoint': 'https://archive.podaac.earthdata.nasa.gov/s3credentials'},
 'concept-id': 'C2251465126-POCLOUD',
 'file-type': "[{'Format': 'netCDF-4', 'FormatType': 'Native'}]",
 'get-data': ['https://search.earthdata.nasa.gov/search/granules?p=C2251465126-POCLOUD',
              'https://cmr.earthdata.nasa.gov/virtual-directory/collections/C2251465126-POCLOUD'],
 'short-name': 'ALTIKA_SARAL_L2_OST_XOGDR',
 'version': 'f'}
These data are near-real-time (NRT) (within 7-9 hours of measurement) sea surface height anomalies (SSHA) from the AltiKa altimeter onboard the Satellite with ARgos and ALtiKa (SARAL). SARAL is a French (CNES)/Indian (ISRO) collaborative mission to measure sea surface height using the Ka-band AltiKa altimeter and was launched February 25, 2013. The major difference between these data and the Operational Geophysical Data Record (OGDR) data produced by the project is that the orbit from SARAL has been adjusted using SSHA differences with those from the OSTM/Jason-2 GPS-OGDR-SSHA product at inter-satellite crossover locations. This produces a more accurate NRT orbit altitude for SARAL with accuracy of 1.5 cm (RMS), taking advantage of the 1 cm (radial RMS) accuracy of the GPS-based orbit used for the OSTM/Jason-2 GPS-OGDR-SSHA product. This dataset also contains all data from the project (reduced) OGDR, and improved altimeter wind speeds and sea state bias correction. More information on the SARAL mission can be found at: http://www.aviso.oceanobs.com/en/missions/current-missions/saral.html 
 {'DOI': '10.5067/AKASA-XOGD1', 'Authority': 'https://doi.org'} 

Access Data

The first dataset looks like it contains data from many altimetry missions; let’s explore it a bit more. We will access the first granule of the SEA_SURFACE_HEIGHT_ALT_GRIDS_L4_2SATS_5DAY_6THDEG_V_JPL2205 collection found in the month of May for every year that data is available, and open the granules via xarray.


granules = []

for year in range(1990, 2020):
    print(f"Searching for data in {year}")
    granule = earthaccess.search_data(doi="10.5067/SLREF-CDRV3",
                                      temporal=(f"{year}-05", f"{year}-06"),
                                      count=1)
    if len(granule) > 0:
        granules.append(granule[0])
print(f"Total granules: {len(granules)}")
Searching for data in 1990
Granules found: 0
Searching for data in 1991
Granules found: 0
Searching for data in 1992
Granules found: 0
Searching for data in 1993
Granules found: 6
Searching for data in 1994
Granules found: 6
Searching for data in 1995
Granules found: 6
Searching for data in 1996
Granules found: 7
Searching for data in 1997
Granules found: 7
Searching for data in 1998
Granules found: 7
Searching for data in 1999
Granules found: 7
Searching for data in 2000
Granules found: 7
Searching for data in 2001
Granules found: 7
Searching for data in 2002
Granules found: 7
Searching for data in 2003
Granules found: 7
Searching for data in 2004
Granules found: 6
Searching for data in 2005
Granules found: 6
Searching for data in 2006
Granules found: 6
Searching for data in 2007
Granules found: 6
Searching for data in 2008
Granules found: 6
Searching for data in 2009
Granules found: 6
Searching for data in 2010
Granules found: 6
Searching for data in 2011
Granules found: 6
Searching for data in 2012
Granules found: 6
Searching for data in 2013
Granules found: 6
Searching for data in 2014
Granules found: 6
Searching for data in 2015
Granules found: 6
Searching for data in 2016
Granules found: 7
Searching for data in 2017
Granules found: 7
Searching for data in 2018
Granules found: 7
Searching for data in 2019
Granules found: 7
Total granules: 27
%%time

ds_SSH = xr.open_mfdataset(earthaccess.open(granules), chunks={})
ds_SSH
Opening 27 granules, approx size: 0.24 GB
CPU times: user 4.63 s, sys: 668 ms, total: 5.3 s
Wall time: 1min 55s
<xarray.Dataset>
Dimensions:      (Time: 27, Longitude: 2160, nv: 2, Latitude: 960)
Coordinates:
  * Longitude    (Longitude) float32 0.08333 0.25 0.4167 ... 359.6 359.8 359.9
  * Latitude     (Latitude) float32 -79.92 -79.75 -79.58 ... 79.58 79.75 79.92
  * Time         (Time) datetime64[ns] 1993-05-03T12:00:00 ... 2019-05-02T12:...
Dimensions without coordinates: nv
Data variables:
    Lon_bounds   (Time, Longitude, nv) float32 dask.array<chunksize=(1, 2160, 2), meta=np.ndarray>
    Lat_bounds   (Time, Latitude, nv) float32 dask.array<chunksize=(1, 960, 2), meta=np.ndarray>
    Time_bounds  (Time, nv) datetime64[ns] dask.array<chunksize=(1, 2), meta=np.ndarray>
    SLA          (Time, Latitude, Longitude) float32 dask.array<chunksize=(1, 960, 2160), meta=np.ndarray>
    SLA_ERR      (Time, Latitude, Longitude) float32 dask.array<chunksize=(1, 960, 2160), meta=np.ndarray>
Attributes: (12/21)
    Conventions:            CF-1.6
    ncei_template_version:  NCEI_NetCDF_Grid_Template_v2.0
    Institution:            Jet Propulsion Laboratory
    geospatial_lat_min:     -79.916664
    geospatial_lat_max:     79.916664
    geospatial_lon_min:     0.083333336
    ...                     ...
    version_number:         2205
    Data_Pnts_Each_Sat:     {"16": 780190, "1001": 668288}
    source_version:         commit dc95db885c920084614a41849ce5a7d417198ef3
    SLA_Global_MEAN:        -0.027657876081274502
    SLA_Global_STD:         0.0859016072308609
    latency:                final

Plot Sea Level Anomalies


time = pnw.Player(name='Time', start=0, end=len(ds_SSH.Time)-1, loop_policy='loop', interval=1000)

ds_SSH.SLA.interactive(loc='bottom', aspect="equal").isel(Time=time).hvplot(cmap="inferno", data_aspect=1)

Recreate the Sea Level Rise Infographic

First, we define a function that will calculate the area in 1/6 by 1/6 degree boxes in order to calculate the global mean of the SSH later.

def ssl_area(lats):
    """
    Calculate the area of a 1/6 by 1/6 degree box centered at each latitude in 'lats'.

    Parameters
    ==========
    lats: a list or numpy array of size N with the latitudes of interest.

    Returns
    =======
    out: array of N area values (unit: m^2)
    """
    # Define WGS84 as the CRS:
    geod = Geod(ellps='WGS84')
    # Half-width of a 1/6 degree box:
    dx = 1 / 12.0
    # Lambda that returns the area (the first element of the returned tuple)
    # of the box polygon centered at each latitude:
    c_area = lambda lat: geod.polygon_area_perimeter(np.r_[-dx, dx, dx, -dx], lat + np.r_[-dx, -dx, dx, dx])[0]
    return np.array([c_area(lat) for lat in lats])
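As a quick sanity check on ssl_area (a sketch using a spherical Earth of radius 6371 km rather than the WGS84 ellipsoid), the area of a 1/6° by 1/6° cell centered at latitude φ is R² · Δλ · (sin(φ + Δφ/2) − sin(φ − Δφ/2)), which falls off toward the poles as cos(φ):

```python
import math

def sphere_cell_area(lat_deg, dlat=1/6, dlon=1/6, radius=6_371_000.0):
    """Area (m^2) of a dlat x dlon degree cell centered at lat_deg on a sphere."""
    lat1 = math.radians(lat_deg - dlat / 2)
    lat2 = math.radians(lat_deg + dlat / 2)
    return radius**2 * math.radians(dlon) * (math.sin(lat2) - math.sin(lat1))

print(sphere_cell_area(0.0) / 1e6)                     # ~343 km^2 at the equator
print(sphere_cell_area(60.0) / sphere_cell_area(0.0))  # cos(60 deg) = 0.5
```

The ellipsoidal values from ssl_area should agree with this approximation to within a fraction of a percent.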

Let’s use the function on our Sea Surface Height dataset.

# Note: the data orientation changed in the latest release; this reshape used to be (1, -1)
ssh_area = ssl_area(ds_SSH.Latitude.data).reshape(-1, 1)
print(ssh_area.shape)
(960, 1)

Next, we find and open the historic record dataset, also using earthaccess and xarray.

historic_ts_results = earthaccess.search_data(short_name='JPL_RECON_GMSL')
historic_ts=xr.open_mfdataset(earthaccess.open([historic_ts_results[0]]), engine='h5netcdf')
historic_ts
Granules found: 1
Opening 1 granules, approx size: 0.0 GB
<xarray.Dataset>
Dimensions:                                             (time: 119)
Coordinates:
  * time                                                (time) datetime64[ns] ...
Data variables: (12/21)
    global_average_sea_level_change                     (time) float32 dask.array<chunksize=(119,), meta=np.ndarray>
    global_average_sea_level_change_upper               (time) float32 dask.array<chunksize=(119,), meta=np.ndarray>
    global_average_sea_level_change_lower               (time) float32 dask.array<chunksize=(119,), meta=np.ndarray>
    glac_mean                                           (time) float32 dask.array<chunksize=(119,), meta=np.ndarray>
    glac_upper                                          (time) float32 dask.array<chunksize=(119,), meta=np.ndarray>
    glac_lower                                          (time) float32 dask.array<chunksize=(119,), meta=np.ndarray>
    ...                                                  ...
    global_average_thermosteric_sea_level_change        (time) float32 dask.array<chunksize=(119,), meta=np.ndarray>
    global_average_thermosteric_sea_level_change_upper  (time) float32 dask.array<chunksize=(119,), meta=np.ndarray>
    global_average_thermosteric_sea_level_change_lower  (time) float32 dask.array<chunksize=(119,), meta=np.ndarray>
    sum_of_contrib_processes_mean                       (time) float32 dask.array<chunksize=(119,), meta=np.ndarray>
    sum_of_contrib_processes_upper                      (time) float32 dask.array<chunksize=(119,), meta=np.ndarray>
    sum_of_contrib_processes_lower                      (time) float32 dask.array<chunksize=(119,), meta=np.ndarray>
Attributes: (12/42)
    title:                     Global sea-level changes and contributing proc...
    summary:                   This file contains reconstructed global-mean s...
    id:                        10.5067/GMSLT-FJPL1
    naming_authority:          gov.nasa.jpl
    source:                    Frederikse et al. The causes of sea-level rise...
    project:                   NASA sea-level change science team (N-SLCT)
    ...                        ...
    time_coverage_start:       1900-01-01
    time_coverage_end:         2018-12-31
    time_coverage_duration:    P119Y
    time_coverage_resolution:  P1Y
    date_created:              2020-07-28
    date_modified:             2020-09-14

Let’s Plot!

%%time

plt.rcParams["figure.figsize"] = (16, 4)

fig, axs = plt.subplots()
plt.grid(True)

# Function to get the area-weighted global mean for plotting.
# SLA/SLA is NaN over land, so it masks land cells out of the area sum;
# multiplying by 1000 converts meters to millimeters.
def global_mean(SLA, **kwargs):
    dout = ((SLA * ssh_area).sum() / (SLA / SLA * ssh_area).sum()) * 1000
    return dout

result = ds_SSH.SLA.groupby('Time').apply(global_mean)

plt.xlabel('Time (year)', fontsize=16)
plt.ylabel('Global Mean SLA (mm)', fontsize=12)

result.plot(ax=axs, color="orange", marker="o", label='Satellite record')

historic_ts['global_average_sea_level_change'].plot(ax=axs, label='Historical in-situ record', color="lightblue")

plt.legend()
plt.show()

CPU times: user 4.02 s, sys: 918 ms, total: 4.94 s
Wall time: 3.28 s
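The SLA/SLA term in global_mean is a masking trick: it equals 1 over ocean and NaN over land, so land cells drop out of both the numerator and the denominator of the area-weighted mean. A pure-Python sketch of the same idea, using hypothetical toy values rather than tutorial data:

```python
import math

def weighted_mean_ignoring_nan(values, weights):
    """Area-weighted mean that skips NaN entries, mirroring the SLA/SLA mask."""
    num = sum(v * w for v, w in zip(values, weights) if not math.isnan(v))
    den = sum(w for v, w in zip(values, weights) if not math.isnan(v))
    return num / den

sla = [0.1, float("nan"), 0.3]  # meters; NaN marks a land cell
area = [2.0, 2.0, 1.0]          # toy cell areas
print(weighted_mean_ignoring_nan(sla, area))  # (0.1*2 + 0.3*1) / (2.0 + 1.0)
```

Without the mask, NaN land cells would poison the sum and the global mean would come out NaN.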

This Data Story is based on Jinbo Wang’s Earthdata Webinar tutorial.